AITopics | data parallelism

Information about AI from the News, Publications, and Conferences

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Ouroboros: On Accelerating Training of Transformer-Based Language Models

Qian Yang, Zhouyuan Huo, Wenlin Wang, Lawrence Carin

Neural Information Processing Systems, Feb-11-2026, 14:53:09 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, arxiv preprint arxiv, transformer-based language model

Country: North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

d01eeca8b24321cd2fe89dd85b9beb51-Supplemental.pdf

Neural Information Processing Systems, Feb-11-2026, 06:46:20 GMT

parallelism, piper, tensor parallelism

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Hardware (0.68)

Piper: Multidimensional Planner for DNN Parallelization

Neural Information Processing Systems, Feb-11-2026, 06:46:15 GMT

In the "modern era", such model-parallel training techniques trace their roots back to AlexNet [14] and early influential systems such as DistBelief [6] and Project Adam [3].

artificial intelligence, machine learning, parallelism

Country:

North America > United States (0.05)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.05)
North America > Canada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

a37d615b61f999a5fa276adb14643476-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 02:53:21 GMT

algorithm, bandwidth, parallelism

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Virginia (0.05)
North America > United States > Oregon (0.05)

Genre: Research Report (0.67)

Industry: Information Technology > Services (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

a37d615b61f999a5fa276adb14643476-Paper-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 02:53:17 GMT

algorithm, communication cost, parallelism

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Virginia (0.05)
North America > United States > Oregon (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

SAPipe: Staleness-Aware Pipeline for Data Parallel DNN Training

Neural Information Processing Systems, Dec-24-2025, 11:01:23 GMT

Data parallelism across multiple machines is widely adopted for accelerating distributed deep learning, but linear speedup is hard to achieve because of heavy communication overhead. In this paper, we propose SAPipe, a performant system that pushes the training speed of data parallelism to its fullest extent.
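To make that communication cost concrete, here is a minimal single-process sketch of synchronous data parallelism, written for this page and not taken from SAPipe: each simulated worker computes a gradient on its own shard, a stand-in for an all-reduce averages the gradients, and every replica applies the same update. The second step function applies the previous step's averaged gradient instead, a generic one-step-stale scheme of the kind that lets communication overlap computation; SAPipe's actual staleness-aware mechanism is not described in the snippet above. The toy linear model and all function names are hypothetical.

```python
# Toy illustration of data-parallel SGD (hypothetical; not SAPipe's implementation).
# A 1-D linear model y = w * x is fit with mean squared error on 4 simulated workers.
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (x, y)


def shard_gradient(w: float, shard: List[Sample]) -> float:
    """Gradient of the mean of 0.5 * (w * x - y) ** 2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(grads: List[float]) -> float:
    """Stand-in for an all-reduce: every worker receives the mean gradient."""
    return sum(grads) / len(grads)


def sync_step(w: float, shards: List[List[Sample]], lr: float) -> float:
    """Synchronous step: wait for the averaged gradient, then update."""
    grads = [shard_gradient(w, s) for s in shards]  # run in parallel on real hardware
    return w - lr * all_reduce_mean(grads)


def stale_step(
    w: float, pending: Optional[float], shards: List[List[Sample]], lr: float
) -> Tuple[float, float]:
    """One-step-stale step: apply last step's averaged gradient while this
    step's gradient is (conceptually) still being communicated."""
    new_grad = all_reduce_mean([shard_gradient(w, s) for s in shards])
    w_next = w if pending is None else w - lr * pending
    return w_next, new_grad


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # true weight is 3.0
shards = [data[i::4] for i in range(4)]             # 4 equal-size shards

w_sync = 0.0
for _ in range(200):
    w_sync = sync_step(w_sync, shards, lr=0.01)

w_stale, pending = 0.0, None
for _ in range(200):
    w_stale, pending = stale_step(w_stale, pending, shards, lr=0.01)

print(round(w_sync, 4), round(w_stale, 4))
```

With equal-size shards the averaged gradient equals the full-batch gradient, so the synchronous version matches single-device training. The stale version still converges on this toy problem at this small learning rate, but in general staleness can slow or destabilize training, which is the trade-off a staleness-aware design has to manage.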

data parallel dnn training, sapipe, staleness-aware pipeline

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.41)

ASAP: an Agentic Solution to Auto-optimize Performance of Large-Scale LLM Training

Yuran Ding, Xinwei Chen, Xiaofan Zhang, Zongwei Zhou

arXiv.org Artificial Intelligence, Nov-7-2025

Optimizing large language model (LLM) training on distributed domain-specific accelerator systems is challenging because of its complex optimization space. Existing optimization methods rely on time-consuming manual tuning or resource-intensive black-box searches, which struggle to keep pace with the rapidly evolving LLM domain, leading to slow development and underutilized resources. To address this, we introduce ASAP, an Agentic Solution to Auto-optimize Performance of Large-Scale LLM Training. ASAP is a multi-agent system with Coordinator, Analyzer, and Proposal agents that integrates LLM reasoning with insights from performance profiling tools, roofline analysis, and a knowledge base of best practices and successful past optimizations from human experts. The design can automate the diagnosis of performance bottlenecks and recommend optimized sharding configurations with reasoning, thus improving the efficiency of distributed LLM training. In experiments, ASAP-generated sharding configurations contributed up to a 28% reduction in training step time and a 1.43 times throughput improvement. When combined with additional optimizations from human experts, throughput increased further, to 2.58 times. ASAP promises a scalable and explainable methodology for AI-assisted performance engineering in large-scale LLM training.
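To give a sense of the sharding search space ASAP navigates, the sketch below enumerates three-dimensional layouts (data, tensor, and pipeline parallel degrees) whose product equals the device count. It is a brute-force illustration, not ASAP's agentic method; its two pruning rules (tensor-parallel groups stay within one host, and every pipeline stage gets an equal layer count) and all names are hypothetical assumptions.

```python
# Hypothetical enumeration of (data, tensor, pipeline) parallel layouts.
# Not ASAP's method: ASAP proposes configurations via LLM agents and profiling.
from itertools import product
from typing import List, NamedTuple


class Layout(NamedTuple):
    data: int
    tensor: int
    pipeline: int


def divisors(n: int) -> List[int]:
    return [d for d in range(1, n + 1) if n % d == 0]


def candidate_layouts(num_devices: int, devices_per_host: int, num_layers: int) -> List[Layout]:
    layouts = []
    for tp, pp in product(divisors(num_devices), repeat=2):
        if num_devices % (tp * pp):
            continue  # tensor x pipeline groups must tile the devices exactly
        if tp > devices_per_host:
            continue  # assumption: keep tensor-parallel traffic on fast intra-host links
        if num_layers % pp:
            continue  # assumption: every pipeline stage holds the same number of layers
        layouts.append(Layout(data=num_devices // (tp * pp), tensor=tp, pipeline=pp))
    return layouts


for layout in candidate_layouts(num_devices=16, devices_per_host=8, num_layers=24):
    print(layout)
```

Even this toy setup yields 13 valid layouts for 16 devices; each further knob (micro-batch size, activation checkpointing, per-layer sharding choices) multiplies the space, consistent with the abstract's point that manual tuning and black-box search struggle to keep up.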

artificial intelligence, large language model, natural language

2511.03844

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Scaling Performance of Large Language Model Pretraining

Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, Chris Connelly, Albert Reuther

arXiv.org Artificial Intelligence, Oct-10-2025

Training these models is an extremely computationally expensive task; frontier Artificial Intelligence (AI) research companies are investing billions of dollars into supercomputing infrastructure to train progressively larger models on increasingly massive datasets. Unfortunately, very little information about the scaling performance and training considerations of these large training pipelines is released publicly. Working with very large datasets and models can be complex, and practical recommendations for tuning training performance when scaling up large language models are scarce in the public literature. In this paper, we aim to demystify the large language model pretraining pipeline somewhat, in particular with respect to distributed training, managing large datasets across hundreds of nodes, and scaling up data parallelism with an emphasis on fully leveraging available GPU compute capacity. Index Terms: large language models, distributed training, data parallelism.
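Two quantities dominate when scaling data parallelism: the global batch size and how the dataset is partitioned across ranks. The sketch below is a generic illustration with hypothetical names and numbers, not the paper's pipeline. The global batch is the per-GPU micro-batch times the gradient-accumulation steps times the number of data-parallel ranks, and each rank reads a disjoint, equal-size slice of a shuffle that every rank reproduces from a shared seed (similar in spirit to PyTorch's `DistributedSampler` with `drop_last=True`).

```python
# Hypothetical helpers for data-parallel pretraining bookkeeping.
import random
from typing import List


def global_batch_size(micro_batch: int, grad_accum_steps: int, dp_ranks: int) -> int:
    """Samples consumed per optimizer step across all data-parallel ranks."""
    return micro_batch * grad_accum_steps * dp_ranks


def rank_shard(num_samples: int, rank: int, world_size: int, seed: int, epoch: int) -> List[int]:
    """Indices this rank reads in a given epoch.

    Every rank shuffles with the same seed, so the permutations agree without any
    communication; the tail is dropped so all shards have equal length."""
    rng = random.Random(seed + epoch)
    order = list(range(num_samples))
    rng.shuffle(order)
    usable = num_samples - num_samples % world_size
    return order[rank:usable:world_size]


# Example: 128 nodes x 8 GPUs = 1024 data-parallel ranks (illustrative numbers).
print(global_batch_size(micro_batch=4, grad_accum_steps=2, dp_ranks=1024))  # 8192

shards = [rank_shard(10_003, r, 16, seed=0, epoch=0) for r in range(16)]
print(len(shards[0]), len(set().union(*shards)))  # 625 10000
```

Because the seed includes the epoch, the sample order changes from epoch to epoch while staying identical on every rank, so no index lists need to be broadcast across nodes.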

large language model, machine learning, natural language

2509.05258

Country: North America > United States > Massachusetts > Middlesex County (0.15)

Genre: Research Report (0.66)

Industry:

Government > Regional Government (0.50)
Government > Military (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)

Ouroboros: On Accelerating Training of Transformer-Based Language Models

Qian Yang, Zhouyuan Huo, Wenlin Wang, Lawrence Carin

Neural Information Processing Systems, Oct-2-2025, 06:47:26 GMT

We also prove that our proposed algorithm is guaranteed to converge to critical points for non-convex problems.

artificial intelligence, machine learning, natural language

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

End-to-end RL Improves Dexterous Grasping Policies

Ritvik Singh, Karl Van Wyk, Pieter Abbeel, Jitendra Malik, Nathan Ratliff, Ankur Handa

arXiv.org Artificial Intelligence, Sep-23-2025

This work explores techniques to scale up image-based end-to-end learning for dexterous grasping with an arm + hand system. Unlike state-based RL, vision-based RL is much less memory efficient, resulting in relatively small batch sizes, which are poorly suited to algorithms like PPO. Nevertheless, it remains attractive because, unlike the more commonly used techniques that distill state-based policies into vision networks, end-to-end RL can allow for emergent active vision behaviors. We identify a key bottleneck in training these policies: the way most existing simulators scale to multiple GPUs using traditional data parallelism techniques. We propose a new method that disaggregates the simulator and RL (both training and experience buffers) onto separate GPUs. On a node with four GPUs, the simulator runs on three of them and PPO runs on the fourth. We show that with the same number of GPUs, we can double the number of environments compared to the previous baseline of standard data parallelism. This allows us to train vision-based environments end-to-end with depth, which previously performed far worse with the baseline. We train and distill both depth and state-based policies into stereo RGB networks and show that depth distillation leads to better results, both in simulation and reality. This improvement is likely due to the observability gap between state and vision policies, which does not exist when distilling depth policies into stereo RGB. We further show that the increased batch size brought about by disaggregated simulation also improves real-world performance. When deployed in the real world, our end-to-end policies improve upon the previous state-of-the-art vision-based results.
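The disaggregated layout in this abstract (simulation on three GPUs, PPO on the fourth) is essentially a producer/consumer pipeline. The sketch below models it with Python threads and a bounded queue: three hypothetical simulator workers push batches of transitions and a single learner consumes them. Device placement, the environment step, and the PPO update are placeholders, and all names are hypothetical; this is not the authors' implementation.

```python
# Producer/consumer sketch of disaggregated simulation and learning (hypothetical).
import queue
import threading
from typing import List, Optional, Tuple

Batch = Tuple[int, int, List[float]]  # (simulator id, step, reward for each env)


def simulator(sim_id: int, steps: int, envs: int, out: "queue.Queue[Optional[Batch]]") -> None:
    """Stands in for a simulator process pinned to its own GPU."""
    for step in range(steps):
        rewards = [0.0] * envs  # placeholder for stepping `envs` environments
        out.put((sim_id, step, rewards))  # blocks if the learner falls behind
    out.put(None)  # sentinel: this simulator is done


def learner(inp: "queue.Queue[Optional[Batch]]", num_sims: int) -> int:
    """Stands in for the PPO process on the remaining GPU; returns transitions seen."""
    done, transitions = 0, 0
    while done < num_sims:
        item = inp.get()
        if item is None:
            done += 1
            continue
        _, _, rewards = item
        transitions += len(rewards)  # a real learner would buffer these and run PPO
    return transitions


NUM_SIMS, STEPS, ENVS = 3, 50, 1024  # mirrors the 3 simulator + 1 learner split
q: "queue.Queue[Optional[Batch]]" = queue.Queue(maxsize=8)
workers = [threading.Thread(target=simulator, args=(i, STEPS, ENVS, q)) for i in range(NUM_SIMS)]
for w in workers:
    w.start()
total = learner(q, NUM_SIMS)
for w in workers:
    w.join()
print(total)  # 153600
```

The bounded queue supplies back-pressure: if the learner is slower than the simulators, `put()` blocks instead of letting experience pile up without limit.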

artificial intelligence, arxiv, machine learning

2509.16434

Genre: Research Report (0.86)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Vision (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
